在許多系統中,會使用tag,或是屬性.
我們將介紹傳統的方式,以及PostgreSQL 的 array 兩種方式.
會建立三個table,一個用來放標的,一個用來放tag,一個用來放中介.
create table videos (
vid_id int not null generated always as identity primary key
, vid_number text not null
);
create table tags (
tag_id int not null generated always as identity primary key
, tag_name text not null
, unique(tag_name)
);
create table video_tags (
vid_id int not null
, tag_id int not null
, unique(vid_id, tag_id)
);
--測試資料
insert into videos(vid_number)
values ('MEYD-844'), ('WAAA-293');
insert into tags (tag_name)
values ('熟女'),('人妻'),('巨乳'),('泳裝'),
('單體作品'),('高清'),('獨家'),('4K'),
('女教師'),('痴女'),('大屁股'),('中出');
insert into video_tags values
(1, 1), (1, 2), (1, 3), (1, 4),
(1, 5), (1, 6), (1, 7), (1, 8),
(2, 9), (2, 10),(2, 11), (2, 12),
(2, 3), (2, 1), (2, 8), (2, 7);
select v.vid_number
from videos v
join video_tags vt
on (v.vid_id = vt.vid_id)
join tags t
on (vt.tag_id = t.tag_id)
where t.tag_name = '女教師';
vid_number
------------
WAAA-293
select v.vid_number
from videos v
join video_tags vt1
on (v.vid_id = vt1.vid_id)
join video_tags vt2
on (vt1.vid_id = vt2.vid_id)
join tags t1
on (vt1.tag_id = t1.tag_id)
join tags t2
on (vt2.tag_id = t2.tag_id)
where t1.tag_name = '熟女'
and t2.tag_name = '巨乳';
vid_number
------------
MEYD-844
WAAA-293
(2 rows)
select v.vid_number
from videos v
join video_tags vt1
on (v.vid_id = vt1.vid_id)
join video_tags vt2
on (vt1.vid_id = vt2.vid_id)
join video_tags vt3
on (vt2.vid_id = vt3.vid_id)
join tags t1
on (vt1.tag_id = t1.tag_id)
join tags t2
on (vt2.tag_id = t2.tag_id)
join tags t3
on (vt3.tag_id = t3.tag_id)
where t1.tag_name = '熟女'
and t2.tag_name = '巨乳'
and t3.tag_name = '泳裝';
vid_number
------------
MEYD-844
由上面的例子可以看到,查詢方式較為繁瑣.
這時只需要兩個 table,tags 還是一樣,將中介的 table video_tags
合併進 videos 產生新的 videos_array
create table videos_array (
vid_id int not null primary key
, vid_number text not null
, vid_tags int[]
);
insert into videos_array
select v.vid_id, vid_number, array_agg(tag_id)
from videos v
join video_tags
using (vid_id)
group by v.vid_id, vid_number;
select *
from videos_array;
vid_id | vid_number | vid_tags
--------+------------+----------------------
1 | MEYD-844 | {1,2,3,4,5,6,7,8}
2 | WAAA-293 | {1,3,7,8,9,10,11,12}
(2 rows)
--可以建立GIN index
create index on videos_array using gin (vid_tags);
---三個 tag 的情況
with t1 as MATERIALIZED (
select array_agg(tag_id) as tag_ids
from tags
where tag_name in ('熟女', '巨乳', '泳裝')
)
select va.vid_number
from videos_array va
join t1
on va.vid_tags @> t1.tag_ids;
vid_number
------------
MEYD-844
當我們需要查詢具有多個tag,只要善用array 的包含運算子 @>
以及 array_agg() 函數,透過 MATERIALIZED 型態 CTE(只計算一次),
即可方便撰寫SQL,執行速度又快,因為資料筆數精簡又不必透過中介table
多次join.
可以使用 unnest() 展開 array
select t.tag_name
from videos_array va
cross join unnest(va.vid_tags) as tag_id
join tags t
using (tag_id)
where va.vid_number = 'MEYD-844';
tag_name
----------
熟女
人妻
巨乳
泳裝
單體作品
高清
獨家
4K
(8 rows)
PostgreSQL的Array,是一個成熟並具有彈性,可以讓我們精簡查詢,減少空間,有效的反正規化,提高查詢速度的好東西,可以多加利用.
感謝佐山愛小姐.
本文同步發表於 https://hackmd.io/@pgsql-tw/rJSOHh8zT